[Topi, x86] Using MKL blas for quantized dense #6115

anijain2305 · 2020-07-22T20:01:13Z

Using MKL for quantized dense, following the MKL fallback for FP32 dense.

On a C5.12xlarge instance (Cascade Lake, with VNNI support), results for BERT base are as follows (latency in ms):

| Type | Sequence length | MXNet+MKLDNN | TVM Alone | TVM+MKL |
|---|---|---|---|---|
| FP32 | 128 | 33.56 | N/A | 16.83 |
| Quantized | 128 | 23.94697 | 77.16 | 11.36 |

@icemelon9 @eric-haibin-lin @shoubhik

TVM Alone has bad performance because we don't have a good integer dense schedule.
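For readers unfamiliar with the operation being offloaded, here is a minimal pure-Python sketch of what an integer dense layer computes. It is a reference for the arithmetic only, not the code in this PR: the function name `quantized_dense_ref` is hypothetical, and the choice of uint8 activations with int8 weights is an assumption (the `s8u8s32` name only indicates one signed and one unsigned 8-bit operand with 32-bit accumulation).

```python
def quantized_dense_ref(data, weight):
    """Reference integer dense: out[i][j] = sum_k data[i][k] * weight[j][k].

    Assumptions (illustrative, not taken from the PR):
      data   -- list of rows, uint8 values in [0, 255], shape (batch, in_dim)
      weight -- list of rows, int8 values in [-128, 127], shape (out_dim, in_dim)
    The result holds int32 accumulators, shape (batch, out_dim).
    """
    in_dim = len(data[0])
    for row in data:
        if len(row) != in_dim or not all(0 <= v <= 255 for v in row):
            raise ValueError("data must be rectangular uint8")
    for row in weight:
        if len(row) != in_dim or not all(-128 <= v <= 127 for v in row):
            raise ValueError("weight must be rectangular int8 with matching in_dim")

    out = []
    for d_row in data:
        out_row = []
        for w_row in weight:
            # Accumulate the 8-bit products in a wider integer.
            acc = sum(d * w for d, w in zip(d_row, w_row))
            if not -2**31 <= acc < 2**31:
                raise OverflowError("accumulator does not fit in int32")
            out_row.append(acc)
        out.append(out_row)
    return out


data = [[0, 255, 10], [1, 2, 3]]          # uint8 activations
weight = [[127, -128, 1], [-1, 0, 2]]     # int8 weights, one row per output unit
result = quantized_dense_ref(data, weight)
print(result)  # [[-32630, 20], [-126, 5]]
```

A tuned integer GEMM produces the same values while using vector instructions such as VNNI for the 8-bit multiply-accumulate, which is where the latency difference in the table comes from.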

eric-haibin-lin

Is the MXNet+MKLDNN baseline also in int8?

anijain2305 · 2020-07-23T05:00:48Z

@eric-haibin-lin Yes, the MXNet+MKLDNN baseline is also in int8.

TaoLv · 2020-07-23T14:07:41Z

Better to show the performance of TVM before using MKL s8u8s32 GEMM.

anijain2305 · 2020-07-23T17:17:10Z

@TaoLv Good point, I added the latency numbers for TVM alone. Thanks for pointing it out!

anijain2305 · 2020-07-24T16:48:55Z

@icemelon9 Can you please manage this PR?

anijain2305 · 2020-07-28T06:38:40Z

Ping @icemelon9

tqchen · 2020-07-28T16:04:47Z

While it is OK to make use of mkldnn in this case, we should always work hard to get good integer schedules and learn from the insights, just as we did for the CUDA softmax and other cases.

anijain2305 · 2020-07-28T17:01:48Z

@tqchen Agreed. For now, my reasoning was to just extend the MKL fallback to int8. But I agree that it will be better to focus on TVM schedules. That will require more work, since even the FP32 schedules for dense are not well optimized.

icemelon

LGTM

icemelon · 2020-07-28T20:32:32Z

Thanks @anijain2305 @eric-haibin-lin @TaoLv @tqchen

* [Topi, x86] Using MKL blas for quantized dense
* Typo
* CBLAS_OFFSET only available for MKL
* Skipping tests as GPU CI uses Openblas
* Retrigger

Co-authored-by: Ubuntu <ubuntu@ip-172-31-0-202.us-west-2.compute.internal>

Ubuntu added 2 commits July 22, 2020 19:57

[Topi, x86] Using MKL blas for quantized dense

73d7c24

Typo

82bcf0c

anijain2305 force-pushed the q_bert_mkl branch from 0ff6b4d to 5a4f0c9 Compare July 23, 2020 02:36

CBLAS_OFFSET only available for MKL

8fd0b17

anijain2305 force-pushed the q_bert_mkl branch from 5a4f0c9 to 8fd0b17 Compare July 23, 2020 02:37

eric-haibin-lin reviewed Jul 23, 2020

Skipping tests as GPU CI uses Openblas

a255360

Retrigger

d3e990c

junrushao approved these changes Jul 26, 2020

icemelon approved these changes Jul 28, 2020

icemelon merged commit 8cd53e0 into apache:master Jul 28, 2020

icemelon added the status: accepted label Jul 28, 2020

ZihengJiang mentioned this pull request Sep 25, 2020

TVM v0.7 Release Note Candidate #6486

Closed
